Cross-Genre Age and Gender Identification in Social Media
Abstract
This paper briefly describes the methods we adopted for the author profiling task of the PAN 2016 competition [1]. Author profiling is the task of predicting an author's age and gender from their writing. We follow a two-level ensemble approach to the cross-genre author profiling task, in which training and test documents come from different genres. We use soft voting to build the classification ensemble. To include various feature sets, we first train logistic regression models on word n-gram, character n-gram, and part-of-speech n-gram features extracted for each genre. We then combine the single-genre predictive models trained on the blog, social media, and Twitter data sources into a multi-genre ensemble. The experimental results indicate that our approach performs well in both single-genre and cross-genre author profiling tasks.
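The two-level soft-voting scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes unweighted averaging at both levels (the abstract does not state whether weights were used), the function names `soft_vote`, `two_level_ensemble`, and `predict_label` are hypothetical, and the probabilities are made-up values standing in for the outputs of per-feature-set logistic regression models.

```python
from typing import Dict, List

# One model's output: class label -> predicted probability.
Probs = Dict[str, float]


def soft_vote(predictions: List[Probs]) -> Probs:
    """Unweighted soft voting: average class probabilities across models."""
    if not predictions:
        raise ValueError("need at least one prediction")
    labels = list(predictions[0].keys())
    n = len(predictions)
    return {label: sum(p[label] for p in predictions) / n for label in labels}


def two_level_ensemble(per_genre: Dict[str, List[Probs]]) -> Probs:
    """Combine feature-set models within each genre, then across genres."""
    # Level 1: one soft vote per genre over its feature-set models
    # (e.g. word n-gram, character n-gram, part-of-speech n-gram).
    genre_level = [soft_vote(models) for models in per_genre.values()]
    # Level 2: soft vote over the single-genre ensembles.
    return soft_vote(genre_level)


def predict_label(probs: Probs) -> str:
    """Return the class with the highest averaged probability."""
    return max(probs, key=probs.get)


# Hypothetical probabilities for a single test document (not real results).
example = {
    "blog": [{"female": 0.6, "male": 0.4}, {"female": 0.7, "male": 0.3}],
    "twitter": [{"female": 0.4, "male": 0.6}, {"female": 0.5, "male": 0.5}],
}
combined = two_level_ensemble(example)
label = predict_label(combined)
```

Averaging probabilities rather than counting hard votes lets a confident model outweigh several uncertain ones, which is the usual motivation for soft voting when the base classifiers produce calibrated probabilities, as logistic regression typically does.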
Similar resources
Age and Gender Identification using Stacking for Classification
This paper presents our approach of identifying the profile of an unknown user based on the activities of known users. The aim of author profiling task of PAN@CLEF 2016 is cross-genre identification of the gender and age of an unknown user. This means training the system using the behavior of different users from one social media platform and identifying the profile of other user on some differ...
Overview of the 4th Author Profiling Task at PAN 2016: Cross-Genre Evaluations
This overview presents the framework and the results of the Author Profiling task at PAN 2016. The objective was to predict age and gender from a cross-genre perspective. For this purpose a corpus from Twitter has been provided for training, and different corpora from social media, blogs, essays, and reviews have been provided for evaluation. Altogether, the approaches of 22 participants were e...
Author Profiling with Doc2vec Neural Network-Based Document Embeddings
To determine author demographics of texts in social media such as Twitter, blogs, and reviews, we use doc2vec document embeddings to train a logistic regression classifier. We experimented with age and gender identification on the PAN author profiling 2014–2016 corpora under both single- and cross-genre conditions. We show that under certain settings the neural network-based features outperform t...
GronUP: Groningen User Profiling
We train an SVM linear model on tweets to perform user profiling, in terms of gender and age, on non-Twitter social media data, whose actual nature is unknown to us at developing time. We choose features that we deem appropriate to profile authors on social media in general, and which do not characterise the specifics of Twitter data too closely. Additionally, we pay specific attention to engin...
Profiling Microblog Authors using Concreteness and Sentiment - Know-Center at PAN 2016 Author Profiling
The PAN 2016 author profiling task is a supervised classification problem on cross-genre documents (tweets, blog and social media posts). Our system makes use of concreteness, sentiment and syntactic information present in the documents. We train a random forest model to identify gender and age of a document’s author. We report the evaluation results received by the shared task.